Skip to content
This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Add combinations #1

Closed
wants to merge 11 commits into from
Closed

Add combinations #1

wants to merge 11 commits into from

Conversation

Jayd-1234
Copy link
Collaborator

Added preliminary version of combinations. It has been tested with different JaggedArray inputs, and was working as expected. The tests will be given shortly.

Things that are to be done:

  • Add more case checking
  • Add docstring describing it's usage
  • Add support for other array types ( not only JaggedArray)

Added preliminary version of combinations
@Jayd-1234 Jayd-1234 added the enhancement New feature or request label Jun 28, 2018
@Jayd-1234 Jayd-1234 requested a review from jpivarski June 28, 2018 13:34
@Jayd-1234 Jayd-1234 requested a review from davidlange6 June 28, 2018 13:47
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Put the import awkward.array.table (fully qualified) inside the function so that there isn't a dependency order among array types. Also, use the fully qualified name throughout (awkward.array.table.Table rather than Table).
  • Remove vectorized_parents. User-facing methods won't say how they're implemented: we would either replace regular parents with this or not. I've already done performance tests: regular parents is faster for Numpy, so we keep that. This vectorized algorithm would be the right thing to do on a GPU (a future awkward.gpu acceleration package), but not for regular Numpy.
  • Remove return_indices as an argument; this method should be named aproduct and always return indexes (like amin and amax). There should be another method named product that calls this method and applies the indexes to get values. Moreover, I've established a style in which we spell it "indexes" (English allows both, so we have to pick one for better search-and-replace).
  • Remove the writeable argument; it should inherit self._writeable.
  • The first jagged array is self, so that this can be used as a method. It should be a red flag when a method never uses self. The second jagged array should be named other, a convention I've established. You only need to check the type of other (which would exclude the None case, no need for an explicit check).
  • Minimize the introduction of unnecessary names. For instance NUMEVENTS (which, as a name, would be confusing outside of a physics context) is just len(self), which is short enough to use as an idiom. Moreover, it's only used once. Introducing names adds cognitive burden, making code harder to understand, especially when they're not meaningful names. Reducing the number of names would also help you avoid underscored_names.
  • The dtype should be self.INDEXTYPE. It's centrally managed in the base class.
  • Use spaces around arithmetic operations and after commas for clarity. (I use spaces around + and - but sometimes not * and / as a reminder of order of operations.)
  • Inside a JaggedArray method, use the private member names: _content rather than content.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Hide parents_from_offsets inside one of these methods (so that the user doesn't see it) or just inline the loop because it's only used in one place.
  • Don't copy-paste huge blocks of code! product uses the output of argproduct, so have it call argproduct to extract what it needs, rather than reimplement it. Even the sanity checks (ValueErrors) don't have to be re-checked: if product calls argproduct, argproduct does all the necessary checks.
  • Don't convert arrays with astype if you don't have to. numpy.arange takes a dtype argument, so use that.

starts2 = other.starts
stops2 = other.stops
counts2 = stops2 - starts2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check that counts1.shape == counts2.shape?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't be necessary, as we are doing shape checking for starts array. The shape of counts and starts are same.

@@ -406,6 +407,56 @@ def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
return None
else:
return JaggedArray(starts, stops, result)


def argproduct(self, other):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a small bit of documentation about what this function does.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check it out. I have added simple docstrings describing the usage

pairs_counts = numpy.zeros(len(starts1)+1, dtype=self.INDEXTYPE)
pairs_counts[1:] = numpy.cumsum(counts1*counts2, dtype=self.INDEXTYPE)

def parents_from_offsets(offsets, content):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content is needed in this function only to get the length of the output array. This is almost encoded into offsets. I wonder if a simpler solution could be used? Eg, to either pass in len(content) or to append the N+1 offset of the offsets array (which may be useful for other reasons)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(actually my second suggestion looks to be already the case - eg, can I not do

out=numpy.full(offsets[-1],-1, dtype=selfINDEXTYPE)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are right! We need the length only. I will do it by tomorrow morning ( travelling now).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

- Added parents without using content
- Added simple docstrings regarding usage.
@jpivarski
Copy link
Member

I had to make some corrections to the conventions (e.g. "offsets" we being named "counts") and fixed a bug (argproduct Table's length was set to pairs_indexes[-1], which is one too few). Also removed redundant implementations.

However, now that I've pushed to what I thought was this pull request, I don't see my commit showing up here. Is it on your fork?

@jpivarski jpivarski mentioned this pull request Jul 12, 2018
@jpivarski
Copy link
Member

Superseded by pull request #2 (which has all of these commits, plus one from me).

@jpivarski jpivarski closed this Jul 12, 2018
@Jayd-1234
Copy link
Collaborator Author

Jayd-1234 commented Jul 12, 2018

@jpivarski Thanks! That was a sneaky little bug..

I don't see the commit on my fork. Maybe it was closed?

@jpivarski jpivarski mentioned this pull request Apr 2, 2019
EscottC added a commit that referenced this pull request Jul 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants